Search results for "Unsupervised classification"
showing 4 items of 4 documents
Application of the Information Bottleneck method to discover user profiles in a Web store
2018
The paper deals with the problem of discovering groups of Web users with similar behavioral patterns on an e-commerce site. We introduce a novel approach to the unsupervised classification of user sessions, based on session attributes related to the user click-stream behavior, to gain insight into characteristics of various user profiles. The approach uses the agglomerative Information Bottleneck (IB) algorithm. Based on log data for a real online store, the efficiency of the approach was validated in terms of its ability to differentiate between buying and non-buying sessions, indicating some possible practical applications of our method. Experiments performed for a number of session sampl…
Time series clustering with different distance measures to tell Web bots and humans apart
2022
The paper deals with the problem of differentiating Web sessions of bots and human users by observing some characteristics of their traffic at the Web server input. We propose an approach to cluster bots’ and humans’ sessions represented as time series. First, sessions are expressed as sequences of HTTP requests coming to the server at specific timestamps; then, they are preprocessed to form time series of limited length. The time series are clustered and the clustering performance is evaluated in terms of the ability to partition bots and humans into separate clusters. The proposed approach is applied to real server log data and validated with the use of different time series distance meas…
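As a rough illustration of the idea in this abstract, the sketch below clusters short per-session time series with a pluggable distance function. The listing does not say which distance measures the paper uses, so dynamic time warping (DTW) is an assumption here, as are the k-medoids clustering step, the toy request counts, and every function name; none of this is the authors' implementation.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    DTW is an assumed choice; the abstract only mentions
    'different time series distance' measures.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def k_medoids(series, k, dist, iters=10):
    """Tiny k-medoids; the first k series seed the medoids (an assumption)."""
    medoids = list(range(k))
    labels = [0] * len(series)
    for _ in range(iters):
        # Assign each series to its nearest medoid.
        labels = [
            min(range(k), key=lambda c: dist(s, series[medoids[c]]))
            for s in series
        ]
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda i: sum(dist(series[i], series[j]) for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical requests-per-interval series for four sessions.
sessions = [
    [5, 5, 5, 5, 5, 5],  # steady, bot-like
    [0, 3, 0, 0, 1, 0],  # bursty, human-like
    [5, 6, 5, 5, 6, 5],  # steady, bot-like
    [0, 0, 2, 0, 0, 1],  # bursty, human-like
]
labels, medoids = k_medoids(sessions, k=2, dist=dtw_distance)
```

Passing the distance as a parameter makes it easy to swap in another measure (for example, a plain Euclidean distance on equal-length series) and compare how well each one separates the two groups.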
A bibliometric approach to finding fields that co-evolved with information technology
2020
Some declining industries, such as the music industry, have been revived by information technology (IT). Likewise, in academia, co-evolution between IT and other fields has been expected to bring about the resurgence of either field. In this research, the clustering of citation networks with 14,438 academic papers resulted in the identification of 28 academic fields in the areas “Computer Science” or “Information Science and Library Science.” Co-evolutions between these 28 fields and the fields citing them were evaluated by an investigation of contents; a methodology to search for co-evolutions was also proposed. This paper proposes that pairs of academic fields (wi…
Bot recognition in a Web store: An approach based on unsupervised learning
2020
Web traffic on e-business sites is increasingly dominated by artificial agents (Web bots) which pose a threat to the website security, privacy, and performance. To develop efficient bot detection methods and discover reliable e-customer behavioural patterns, the accurate separation of traffic generated by legitimate users and Web bots is necessary. This paper proposes a machine learning solution to the problem of bot and human session classification, with a specific application to e-commerce. The approach studied in this work explores the use of unsupervised learning (k-means and Graded Possibilistic c-Means), followed by supervised labelling of clusters, a generative learning stra…
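The two-stage idea named in this abstract (unsupervised clustering, then supervised labelling of the clusters) can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses plain k-means, assigns each cluster the majority class of a few labelled sessions, and runs on invented two-feature sessions. The majority-vote rule, the features, and all names are hypothetical and are not taken from the paper.

```python
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def label_clusters(cluster_ids, known):
    """Map each cluster to the majority class among its labelled sessions.

    known: dict of session index -> class name (the small labelled subset).
    Majority voting is an assumed labelling rule.
    """
    votes = {}
    for idx, cls in known.items():
        votes.setdefault(cluster_ids[idx], Counter())[cls] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


# Hypothetical features: (requests per session, mean seconds between requests).
# Real features would normally be scaled before clustering.
points = [
    (200, 0.1), (3, 30.0), (180, 0.2),
    (5, 25.0), (220, 0.1), (4, 40.0),
]
cluster_ids, _ = kmeans(points, k=2)

# Only two sessions carry a ground-truth label in this toy example.
mapping = label_clusters(cluster_ids, known={0: "bot", 1: "human"})
predicted = [mapping[c] for c in cluster_ids]
```

The point of the second stage is that only a small labelled subset is needed: every unlabelled session inherits the class assigned to its cluster.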